Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
                                            Some full text articles may not yet be available without a charge during the embargo (administrative interval).
                                        
                                        
                                        
                                            
                                                
                                             What is a DOI Number?
                                        
                                    
                                
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- 
            Abstract In the biomedical domain, taxonomies organize the acquisition modalities of scientific images in hierarchical structures. Such taxonomies leverage large sets of correct image labels and provide essential information about the importance of a scientific publication, which could then be used in biocuration tasks. However, the hierarchical nature of the labels, the overhead of processing images, the absence or incompleteness of labelled data and the expertise required to label this type of data impede the creation of useful datasets for biocuration. From a multi‐year collaboration with biocurators and text‐mining researchers, we derive an iterative visual analytics and active learning (AL) strategy to address these challenges. We implement this strategy in a system called BI‐LAVA—Biocuration with Hierarchical Image Labelling through Active Learning and Visual Analytics. BI‐LAVA leverages a small set of image labels, a hierarchical set of image classifiers and AL to help model builders deal with incomplete ground‐truth labels, target a hierarchical taxonomy of image modalities and classify a large pool of unlabelled images. BI‐LAVA's front end uses custom encodings to represent data distributions, taxonomies, image projections and neighbourhoods of image thumbnails, which help model builders explore an unfamiliar image dataset and taxonomy and correct and generate labels. An evaluation with machine learning practitioners shows that our mixed human–machine approach successfully supports domain experts in understanding the characteristics of classes within the taxonomy, as well as validating and improving data quality in labelled and unlabelled collections.more » « lessFree, publicly-accessible full text available February 1, 2026
- 
            Biocuration is the process of analyzing biological or biomedical articles to organize biological data into data repositories using taxonomies and ontologies. Due to the expanding number of articles and the relatively small number of biocurators, automation is desired to improve the workflow of assessing articles worth curating. As figures convey essential information, automatically integrating images may improve curation. In this work, we instantiate and evaluate a first-in-kind, hybrid image+text document search system for biocuration. The system, MouseScholar, leverages an image modality taxonomy derived in collaboration with biocurators, in addition to figure segmentation, and classifiers components as a back-end and a streamlined front-end interface to search and present document results. We formally evaluated the system with ten biocurators on a mouse genome informatics biocuration dataset and collected feedback. The results demonstrate the benefits of blending text and image information when presenting scientific articles for biocuration.more » « less
- 
            Abstract MotivationFigures in biomedical papers communicate essential information with the potential to identify relevant documents in biomedical and clinical settings. However, academic search interfaces mainly search over text fields. ResultsWe describe a search system for biomedical documents that leverages image modalities and an existing index server. We integrate a problem-specific taxonomy of image modalities and image-based data into a custom search system. Our solution features a front-end interface to enhance classical document search results with image-related data, including page thumbnails, figures, captions and image-modality information. We demonstrate the system on a subset of the CORD-19 document collection. A quantitative evaluation demonstrates higher precision and recall for biomedical document retrieval. A qualitative evaluation with domain experts further highlights our solution’s benefits to biomedical search. Availability and implementationA demonstration is available at https://runachay.evl.uic.edu/scholar. Our code and image models can be accessed via github.com/uic-evl/bio-search. The dataset is continuously expanded.more » « less
- 
            This work proposes a domain-informed neural network architecture for experimental particle physics, using particle interaction localization with the time-projection chamber (TPC) technology for dark matter research as an example application. A key feature of the signals generated within the TPC is that they allow localization of particle interactions through a process called reconstruction (i.e., inverse-problem regression). While multilayer perceptrons (MLPs) have emerged as a leading contender for reconstruction in TPCs, such a black-box approach does not reflect prior knowledge of the underlying scientific processes. This paper looks anew at neural network-based interaction localization and encodes prior detector knowledge, in terms of both signal characteristics and detector geometry, into the feature encoding and the output layers of a multilayer (deep) neural network. The resulting neural network, termed Domain-informed Neural Network (DiNN), limits the receptive fields of the neurons in the initial feature encoding layers in order to account for the spatially localized nature of the signals produced within the TPC. This aspect of the DiNN, which has similarities with the emerging area of graph neural networks in that the neurons in the initial layers only connect to a handful of neurons in their succeeding layer, significantly reduces the number of parameters in the network in comparison to an MLP. In addition, in order to account for the detector geometry, the output layers of the network are modified using two geometric transformations to ensure the DiNN produces localizations within the interior of the detector. The end result is a neural network architecture that has 60% fewer parameters than an MLP, but that still achieves similar localization performance and provides a path to future architectural developments with improved performance because of their ability to encode additional domain knowledge into the architecture.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
